This workshop is a beginner-level introduction to programming in R. The course is designed to be taught in two 3-hour sessions and focuses on applying R to the analysis of tabular data from clinical trials.
1 The basics
Learning objectives
- Become familiar with the language and the logic behind it
- Create a project in RStudio
- Configure the working directory
- Create your first R script
- Get comfortable using the R console
- Perform arithmetic operations
- Use logical operators on variables
- Learn how to ask for help
- Get comfortable installing packages
1.1 Installing packages
Packages can be installed from several sources.
CRAN
install.packages(c("dplyr","ggplot2","gapminder","medicaldata"))
Bioconductor
For more details about the project, visit https://www.bioconductor.org
To install packages from Bioconductor, you first need to install the BiocManager package and then Bioconductor itself.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos = "https://stat.ethz.ch/CRAN/")
BiocManager::install(version = "3.15") # the Bioconductor version must match your version of R
Then you can install any Bioconductor package with BiocManager::install()
BiocManager::install("DESeq2")
GitHub
If you want to install the development version of a package, or a package that is only available on GitHub, you can use the devtools package.
install.packages("devtools") # only needed once
devtools::install_github("andreacirilloac/updateR")
1.2 Syntax
Comments
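# This is a comment line
Accessing content
Square brackets select elements by position, and $ extracts a column from a data frame (letters and iris are built into R).
letters[1]
## [1] "a"
letters[2]
## [1] "b"
head(iris$Sepal.Length)
## [1] 5.1 4.9 4.7 4.6 5.0 5.4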
1.3 Arithmetic operations
# Addition
2+2
## [1] 4
# Subtraction
3-5
## [1] -2
# Multiplication
71*9
## [1] 639
# Division
90/3
## [1] 30
# Power
2^3
## [1] 8
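Two more arithmetic operators come in handy later, for example when selecting even numbers: integer division (%/%) and the remainder of a division, or modulo (%%). A minimal sketch:

```r
# Integer division: how many whole times 2 fits into 7
quotient <- 7 %/% 2
quotient             # 3

# Modulo: what is left over after integer division
remainder <- 7 %% 2
remainder            # 1

# A remainder of 0 after dividing by 2 means the number is even
is_even <- c(4, 7, 10) %% 2 == 0
is_even              # TRUE FALSE TRUE
```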
1.4 Creating variables
# The convention is to assign values with the left-pointing arrow <-
var1 <- 12
var2 <- "hello world"
var1
## [1] 12
var2
## [1] "hello world"
# It is also possible to use the = sign, but it is not considered good practice
var1 = 12
var2 = "hello world"
var1
## [1] 12
var2
## [1] "hello world"
1.5 Logical operators
# First create two numeric variables
var1 <- 35
var2 <- 27
# Equal to
var1 == var2
## [1] FALSE
# Less than or equal to
var1 <= var2
## [1] FALSE
# They also work with other classes
var1 <- "mango"
var2 <- "mangos"
var1 == var2
## [1] FALSE
Strings are compared character by character until two characters differ or one of the strings runs out of characters; if one string is a prefix of the other, the shorter one is considered smaller. The ordering of letters follows your system's locale.
var1 < var2
## [1] TRUE
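Conditions can be combined element by element with & (AND) and | (OR), and negated with ! (NOT). The sketch below uses two small hypothetical vectors, patient ages and treatment status, that are not part of any dataset in this workshop. The doubled forms && and || are meant for single TRUE/FALSE values, for example inside if().

```r
# Hypothetical example data
age <- c(34, 67, 45, 72)
on_treatment <- c(TRUE, FALSE, TRUE, TRUE)

# & (AND): TRUE only where both conditions are TRUE
older_and_treated <- age > 60 & on_treatment
older_and_treated    # FALSE FALSE FALSE TRUE

# | (OR): TRUE where at least one condition is TRUE
older_or_treated <- age > 60 | on_treatment
older_or_treated     # TRUE TRUE TRUE TRUE

# ! (NOT): flips TRUE and FALSE
not_treated <- !on_treatment
not_treated          # FALSE TRUE FALSE FALSE
```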
We can test whether a value is contained in a vector with %in%
"c" %in% letters
## [1] TRUE
"c" %in% LETTERS
## [1] FALSE
1.6 Seeking help
Open the help page of a function, here c() (concatenate)
?c
Print a compact description of the structure of an object
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
2 Data types and data structures
Learning objectives
- Understand the differences between classes, objects and data types in R
- Create objects of different types
- Subset and index objects
- Learn and use vectorized operations
2.1 Vectors
Key points:
- Can only contain elements of the same type
- Are the most basic type of R object
- A single value, such as var1 <- 12, is a vector of length 1
2.1.1 Numeric
Numeric vectors store numbers as doubles, which can have decimals. The name refers to double-precision floating point: each value takes 8 bytes (64 bits) of memory and is accurate to roughly 15 to 16 significant digits.
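Limited precision means that some decimal numbers cannot be stored exactly, which matters when comparing results with ==. A short sketch; all.equal() compares with a small numerical tolerance instead of requiring an exact match:

```r
x <- 0.1 + 0.2
typeof(x)                    # "double"

# 0.1 and 0.2 have no exact binary representation, so the sum is off by a tiny amount
x == 0.3                     # FALSE

# Compare with a tolerance instead; isTRUE() turns the result into a single TRUE/FALSE
isTRUE(all.equal(x, 0.3))    # TRUE

# Printing more digits reveals the rounding error
print(x, digits = 17)
```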
Creating a numeric vector using c()
x <- c(0.3, 0.1)
x
## [1] 0.3 0.1
Using the vector() function
x <- vector(mode = "numeric",length = 10)
x
## [1] 0 0 0 0 0 0 0 0 0 0
Using the numeric() function
x <- numeric(length = 10)
x
## [1] 0 0 0 0 0 0 0 0 0 0
Creating a numeric vector with a sequence of numbers using seq(from, to, by)
x <- seq(1,10,1)
x
## [1] 1 2 3 4 5 6 7 8 9 10
x <- seq(1,10,2)
x
## [1] 1 3 5 7 9
Creating a numeric vector of repeated values using rep()
x <- rep(2,10)
x
## [1] 2 2 2 2 2 2 2 2 2 2
2.1.2 Integer
Integers store whole numbers, without a decimal component. The suffix L marks a number as an integer.
Creating an integer vector using c()
x <- c(1L,2L,3L,4L,5L)
x
## [1] 1 2 3 4 5
Creating an integer vector of a sequence of numbers with the : operator
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
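Whole numbers typed without the L suffix are still doubles, which you can check with typeof(). A quick sketch:

```r
typeof(5)       # "double": numbers are doubles by default
typeof(5L)      # "integer": the L suffix creates an integer
typeof(1:10)    # "integer": the : operator returns integers
is.integer(5)   # FALSE, even though 5 has no decimal part
is.numeric(5L)  # TRUE: integers also count as numeric
```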
2.1.3 Logical
Creating a logical vector with c(). T and F are shorthand for TRUE and FALSE, but unlike TRUE and FALSE they can be reassigned, so the full words are safer.
x <- c(TRUE,FALSE,T,F)
x
## [1] TRUE FALSE TRUE FALSE
Creating a logical vector with vector()
x <- vector(mode = "logical",length = 5)
x
## [1] FALSE FALSE FALSE FALSE FALSE
Creating a logical vector using logical()
x <- logical(length = 10)
x
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2.1.4 Character
x<-c("a","b","c")
x
## [1] "a" "b" "c"
x<-vector(mode = "character",length=10)
x
## [1] "" "" "" "" "" "" "" "" "" ""
x<-character(length = 3)
x
## [1] "" "" ""
Some useful functions to modify strings
tolower(LETTERS)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
toupper(letters)
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
paste(letters,1:length(letters),sep="_") # the numbers are implicitly coerced to character
## [1] "a_1" "b_2" "c_3" "d_4" "e_5" "f_6" "g_7" "h_8" "i_9" "j_10"
## [11] "k_11" "l_12" "m_13" "n_14" "o_15" "p_16" "q_17" "r_18" "s_19" "t_20"
## [21] "u_21" "v_22" "w_23" "x_24" "y_25" "z_26"
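A few more base R string functions are often useful for cleaning identifiers. A sketch with hypothetical patient IDs:

```r
ids <- c("PT-001", "PT-002", "PT-010")   # hypothetical patient IDs

n_chars <- nchar(ids)          # number of characters in each string: 6 6 6
numbers <- substr(ids, 4, 6)   # characters 4 to 6: "001" "002" "010"
sites <- paste0("site_", 1:3)  # paste() with sep = "": "site_1" "site_2" "site_3"
has_01 <- grepl("01", ids)     # does each ID contain "01"? TRUE FALSE TRUE
```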
2.1.5 Vector attributes
The elements of a vector can have names
x<-1:5
names(x)<-c("one","two","three","four","five")
x
## one two three four five
## 1 2 3 4 5
x<-logical(length = 4)
names(x)<-c("F1","F2","F3","F4")
x
## F1 F2 F3 F4
## FALSE FALSE FALSE FALSE
2.1.6 Built-in functions
To inspect the contents of a vector
is.vector(x) # Check if it is a vector
## [1] TRUE
is.na(x) # Check each element for missing values (NA)
## F1 F2 F3 F4
## FALSE FALSE FALSE FALSE
is.null(x) # Check if it is NULL
## [1] FALSE
is.numeric(x) # Check if it is numeric
## [1] FALSE
is.logical(x) # Check if it is logical
## [1] TRUE
is.character(x) # Check if it is character
## [1] FALSE
To know what kind of vector you are working with
class(x) # Class of the object (e.g. "matrix", "data.frame", "factor")
## [1] "logical"
typeof(x) # Internal storage type (e.g. "logical", "integer", "double", "character", "list")
## [1] "logical"
str(x)
## Named logi [1:4] FALSE FALSE FALSE FALSE
## - attr(*, "names")= chr [1:4] "F1" "F2" "F3" "F4"
To know more about the data contained in the vector
length(x)
## [1] 4
table(x)
## x
## FALSE
## 4
summary(x)
## Mode FALSE
## logical 4
Mathematical operations (logical values are coerced to numbers: FALSE becomes 0 and TRUE becomes 1)
sum(x)
## [1] 0
min(x)
## [1] 0
max(x)
## [1] 0
mean(x)
## [1] 0
median(x)
## [1] 0
sd(x)
## [1] 0
log(x)
## F1 F2 F3 F4
## -Inf -Inf -Inf -Inf
exp(x)
## F1 F2 F3 F4
## 1 1 1 1
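Because TRUE counts as 1 and FALSE as 0, sum() and mean() on a logical vector give the number and the proportion of TRUE values. This is handy for counting, for example, responders in a trial arm. A sketch with hypothetical data:

```r
responded <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # hypothetical response status per patient

n_responders <- sum(responded)       # 3: number of TRUE values
response_rate <- mean(responded)     # 0.6: proportion of TRUE values
```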
2.1.7 Vector arithmetic
Arithmetic operators work element by element on vectors
x<-1:10
y<-11:20
x*2
## [1] 2 4 6 8 10 12 14 16 18 20
x+y
## [1] 12 14 16 18 20 22 24 26 28 30
x*y
## [1] 11 24 39 56 75 96 119 144 171 200
x^y
## [1] 1.000000e+00 4.096000e+03 1.594323e+06 2.684355e+08 3.051758e+10
## [6] 2.821110e+12 2.326305e+14 1.801440e+16 1.350852e+18 1.000000e+20
2.1.8 Recycling
When two vectors have different lengths, R recycles (repeats) the shorter one to match the length of the longer one
x<-1:10
y<-c(1,2)
x+y
## [1] 2 4 4 6 6 8 8 10 10 12
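Recycling is silent when the longer length is a multiple of the shorter one, as above. Otherwise R still recycles but emits a warning, which is often a sign of a bug. A sketch:

```r
x <- 1:10

# 10 is a multiple of 2: c(1, 2) is repeated 5 times without any message
ok <- x + c(1, 2)

# 10 is not a multiple of 3: the result is computed, but R warns
# "longer object length is not a multiple of shorter object length"
odd <- x + c(1, 2, 3)
odd    # 2 4 6 5 7 9 8 10 12 11
```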
2.1.8.1 Exercise
Calculate the sum of the following sequence of fractions:
x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)
# n=100
sum(1/(1:100)^2)
## [1] 1.634984
# n=10000
sum(1/(1:10000)^2)
## [1] 1.644834
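As n grows, this series converges to π²/6 (a result known as the Basel problem), so we can check how close the sums above get. The gap left after n terms is roughly 1/n:

```r
target <- pi^2 / 6                     # the limit of the series, about 1.644934
approx_sum <- sum(1 / (1:10000)^2)     # the n = 10000 sum from above

gap <- target - approx_sum
gap                                    # close to 1/10000
```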
2.1.9 Indexing and subsetting
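For this example, let's create a vector of 15 random integers between 1 and 100, sampled without replacement. Because the numbers are random, your output will differ from the one shown here.
x<-sample(x = 1:100,size = 15,replace = F)
x
## [1] 79 12 28 6 90 23 96 63 67 89 64 51 76 84 46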
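To make random results reproducible, for example in a shared analysis script, fix the state of the random number generator with set.seed() before sampling. The seed value 42 below is arbitrary, and the exact numbers you get can also depend on your R version:

```r
set.seed(42)                                      # fix the generator state
a <- sample(x = 1:100, size = 15, replace = FALSE)

set.seed(42)                                      # reset to the same state
b <- sample(x = 1:100, size = 15, replace = FALSE)

identical(a, b)                                   # TRUE: same seed, same sample
```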
Using the index/position
x[1] # Get the first element
## [1] 79
x[13] # Get the thirteenth element
## [1] 76
Using a vector of indices
x[1:12] # The first 12 numbers
## [1] 79 12 28 6 90 23 96 63 67 89 64 51
x[c(1,5,6,8,9,13)] # Specific positions only
## [1] 79 90 23 63 67 76
Using a logical vector
# Only numbers that are less than or equal to 10
x<=10
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE
x[x<=10]
## [1] 6
# Only even numbers (%% returns the remainder of a division)
x%%2 == 0
## [1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [13] TRUE TRUE TRUE
x[x%%2 == 0]
## [1] 12 28 6 90 96 64 76 84 46
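which() turns a logical vector into the positions of its TRUE values. It is useful when you need the indices themselves, and unlike a logical index it skips NA values. A sketch on a short example vector:

```r
x <- c(79, 12, 28, 6, 90)

even_pos <- which(x %% 2 == 0)
even_pos               # 2 3 4 5: positions of the even numbers
x[even_pos]            # 12 28 6 90: same values as x[x %% 2 == 0]

# With missing values, a logical index returns NA, while which() skips it
y <- c(5, NA, 20)
y[y > 10]              # NA 20
y[which(y > 10)]       # 20
```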
Skipping elements using indices
x[c(-1, -5)]
## [1] 12 28 6 23 96 63 67 89 64 51 76 84 46
Skipping elements using names
x<-1:10
names(x)<-letters[1:10]
x[names(x) != "a"]
## b c d e f g h i j
## 2 3 4 5 6 7 8 9 10
2.1.9.1 Exercise
Find all the odd numbers in x
2.2 Lists
Key points:
- Can contain objects of multiple classes
- Extremely powerful when combined with built-in functions such as lapply()
Creating lists with different data types
l <- list(10, "hello", TRUE)
l
## [[1]]
## [1] 10
##
## [[2]]
## [1] "hello"
##
## [[3]]
## [1] TRUE
Assigning names as we create the list
l<-list(title = "Numbers",
numbers = 1:10,
logic = TRUE)
l
## $title
## [1] "Numbers"
##
## $numbers
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $logic
## [1] TRUE
names(l)
## [1] "title" "numbers" "logic"
2.2.1 Indexing and subsetting
Using [[ ]] to extract a single element
l[[1]]
## [1] "Numbers"
Using $ for named lists
l$logic
## [1] TRUE
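The difference between [ ] and [[ ]] on a list is easy to miss: single brackets return a smaller list, while double brackets (and $) return the element itself. A sketch using the same list as above:

```r
l <- list(title = "Numbers", numbers = 1:10, logic = TRUE)

class(l[1])          # "list": a list containing only the first element
class(l[[1]])        # "character": the first element itself

# Extract the vector first, then index into it
l[["numbers"]][3]    # 3
```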
2.2.2 Built-in functions
l<-list(sample(1:100,10),
sample(1:100,10),
sample(1:100,10))
names(l)<-c("r1","r2","r3")
Performing operations on all elements of the list using lapply()
lsums<-lapply(l,sum)
lsums
## $r1
## [1] 513
##
## $r2
## [1] 415
##
## $r3
## [1] 603
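lapply() always returns a list. When every result is a single value, sapply() simplifies the output to a named vector, which is often easier to work with. A sketch with fixed, hypothetical numbers instead of random samples:

```r
l <- list(r1 = c(1, 2, 3), r2 = c(10, 20), r3 = c(5, 5, 5, 5))

sums_list <- lapply(l, sum)   # a list: $r1 6, $r2 30, $r3 20
sums_vec <- sapply(l, sum)    # a named numeric vector: r1 6, r2 30, r3 20
```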
2.3 Factors
Key points:
- Useful for categorical data
- Levels have an order, which you can set explicitly with the levels argument (or make ordinal with ordered = TRUE)
- Each element has a label or level
- They are important in statistical modelling and plotting with ggplot2
- Some operations behave differently on factors
Creating factors with factor()
cols<-factor(x = c(rep("red",4),rep("blue",5),rep("green",2)),
levels = c("red","blue","green"))
cols
## [1] red red red red blue blue blue blue blue green green
## Levels: red blue green
samples <- c("case", "control", "control", "case")
samples
## [1] "case" "control" "control" "case"
samples_factor <- factor(samples, levels = c("control", "case"))
samples_factor
## [1] case control control case
## Levels: control case
str(samples_factor)
## Factor w/ 2 levels "control","case": 2 1 1 2
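When the categories have a natural order, such as adverse event severity grades, ordered = TRUE turns the factor into an ordered factor that supports comparisons with < and >. A sketch with hypothetical values:

```r
severity <- factor(c("mild", "severe", "moderate", "mild"),
                   levels = c("mild", "moderate", "severe"),
                   ordered = TRUE)
severity                  # prints Levels: mild < moderate < severe

above_mild <- severity > "mild"
above_mild                # FALSE TRUE TRUE FALSE

codes <- as.integer(severity)
codes                     # 1 3 2 1: position of each value in the levels
table(severity)           # counts per level, in level order
```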
2.3.1 Built-in functions
Applying a function to groups of elements with tapply(): here, the mean of the measurements for each level of samples
measurements<-sample(1:1000,6)
samples<-factor(c(rep("case",3),rep("control",3)), levels = c("control", "case"))
tapply(measurements, samples, mean)
## control case
## 577.3333 913.6667
2.4 Matrices
Creating a matrix full of zeros with matrix()
m<-matrix(0, ncol=6, nrow=3)
m
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0
class(m)
## [1] "matrix" "array"
typeof(m)
## [1] "double"
Creating a matrix from a vector of numbers
m<-matrix(1:10, ncol=2, nrow=5)
m
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
2.4.1 Attributes
Names of each dimension
colnames(m)<-letters[1:2]
rownames(m)<-LETTERS[1:5]
m
## a b
## A 1 6
## B 2 7
## C 3 8
## D 4 9
## E 5 10
str(m)
## int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:5] "A" "B" "C" "D" ...
## ..$ : chr [1:2] "a" "b"
2.4.2 Built-in functions
To know the size of the matrix
dim(m)
## [1] 5 2
ncol(m)
## [1] 2
nrow(m)
## [1] 5
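Base R also has functions that summarise a matrix by rows or by columns. apply() is the general version: its MARGIN argument is 1 for rows and 2 for columns. A sketch using the same 5 × 2 matrix as above, without the dimension names:

```r
m <- matrix(1:10, ncol = 2, nrow = 5)

row_totals <- rowSums(m)          # 7 9 11 13 15
col_means <- colMeans(m)          # 3 8
col_max <- apply(m, 2, max)       # 5 10: max() applied to each column
m_t <- t(m)                       # transpose
dim(m_t)                          # 2 5
```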
2.4.2.1 Exercise
What do you think length(m) will return?
2.5 Data frames
Key points:
- Each column of a data frame is a vector
- Each column can be of a different data type, but all columns must have the same length
- A data frame is essentially a list of equal-length vectors
Creating a data frame using data.frame()
df<-data.frame(numbers=1:10,
low_letters=letters[1:10],
logical_values=rep(c(T,F),each=5))
df
## numbers low_letters logical_values
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
## 4 4 d TRUE
## 5 5 e TRUE
## 6 6 f FALSE
## 7 7 g FALSE
## 8 8 h FALSE
## 9 9 i FALSE
## 10 10 j FALSE
class(df)
## [1] "data.frame"
typeof(df)
## [1] "list"
str(df)
## 'data.frame': 10 obs. of 3 variables:
## $ numbers : int 1 2 3 4 5 6 7 8 9 10
## $ low_letters : chr "a" "b" "c" "d" ...
## $ logical_values: logi TRUE TRUE TRUE TRUE TRUE FALSE ...
Renaming columns
colnames(df)[2]<-"lowercase"
head(df)
## numbers lowercase logical_values
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
## 4 4 d TRUE
## 5 5 e TRUE
## 6 6 f FALSE
2.5.1 Indexing and subsetting
df$numbers
## [1] 1 2 3 4 5 6 7 8 9 10
df["numbers"]
## numbers
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 7
## 8 8
## 9 9
## 10 10
df[1,]
## numbers lowercase logical_values
## 1 1 a TRUE
df[,1]
## [1] 1 2 3 4 5 6 7 8 9 10
2.6 Coercion
Converting between data types with the as.*() functions
x<-1:10
as.list(x)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
##
## [[7]]
## [1] 7
##
## [[8]]
## [1] 8
##
## [[9]]
## [1] 9
##
## [[10]]
## [1] 10
l<-list(numbers=1:10,
lowercase=letters[1:10])
l
## $numbers
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $lowercase
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
typeof(l)
## [1] "list"
df<-as.data.frame(l)
df
## numbers lowercase
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
## 7 7 g
## 8 8 h
## 9 9 i
## 10 10 j
typeof(df)
## [1] "list"
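Coercion can also happen implicitly or fail quietly. Two common pitfalls: mixing types in c() converts everything to the most flexible type, and as.numeric() on a factor returns the level codes rather than the values. A sketch with hypothetical dose values:

```r
mixed <- c(1, "a", TRUE)
mixed                  # "1" "a" "TRUE": everything became character

# Text that is not a number becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning here)
nums <- suppressWarnings(as.numeric(c("1.5", "abc")))
nums                   # 1.5 NA

doses <- factor(c("10", "20", "10"))   # hypothetical doses stored as a factor
as.numeric(doses)                      # 1 2 1: the level codes, not the doses
as.numeric(as.character(doses))        # 10 20 10: convert to character first
```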
2.7 Hands-on: data types
- Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Does the matrix() function fill your matrix by column or by row by default? Once you have figured it out, try to change the default. (Hint: read the documentation for matrix.)
- Create a list of length two containing one character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
- There are several subtly different ways to extract variables, observations and elements from data frames. Try them all and discuss with your team what they return. (Hint: use the function typeof().)
- Take the list you created in exercise 2 and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”.
3 Basic data manipulation
Learning objectives
- Read and write data from and to files in different formats (.csv, .tsv)
- Become familiar with basic operations on data frames
- Index and subset data frames using base R functions
- Manipulate specific data frame columns
- Combine data frames by columns and by rows
For this section we will use the package gapminder that we installed earlier.
library(gapminder)
dim(gapminder)
## [1] 1704 6
# View(gapminder) # opens the data in a spreadsheet-style viewer
summary(gapminder$country)
## Afghanistan Albania Algeria
## 12 12 12
## Angola Argentina Australia
## 12 12 12
## Austria Bahrain Bangladesh
## 12 12 12
## Belgium Benin Bolivia
## 12 12 12
## Bosnia and Herzegovina Botswana Brazil
## 12 12 12
## Bulgaria Burkina Faso Burundi
## 12 12 12
## Cambodia Cameroon Canada
## 12 12 12
## Central African Republic Chad Chile
## 12 12 12
## China Colombia Comoros
## 12 12 12
## Congo, Dem. Rep. Congo, Rep. Costa Rica
## 12 12 12
## Cote d'Ivoire Croatia Cuba
## 12 12 12
## Czech Republic Denmark Djibouti
## 12 12 12
## Dominican Republic Ecuador Egypt
## 12 12 12
## El Salvador Equatorial Guinea Eritrea
## 12 12 12
## Ethiopia Finland France
## 12 12 12
## Gabon Gambia Germany
## 12 12 12
## Ghana Greece Guatemala
## 12 12 12
## Guinea Guinea-Bissau Haiti
## 12 12 12
## Honduras Hong Kong, China Hungary
## 12 12 12
## Iceland India Indonesia
## 12 12 12
## Iran Iraq Ireland
## 12 12 12
## Israel Italy Jamaica
## 12 12 12
## Japan Jordan Kenya
## 12 12 12
## Korea, Dem. Rep. Korea, Rep. Kuwait
## 12 12 12
## Lebanon Lesotho Liberia
## 12 12 12
## Libya Madagascar Malawi
## 12 12 12
## Malaysia Mali Mauritania
## 12 12 12
## Mauritius Mexico Mongolia
## 12 12 12
## Montenegro Morocco Mozambique
## 12 12 12
## Myanmar Namibia Nepal
## 12 12 12
## Netherlands New Zealand Nicaragua
## 12 12 12
## Niger Nigeria Norway
## 12 12 12
## Oman Pakistan Panama
## 12 12 12
## (Other)
## 516
3.1 Reading/writing data
3.1.1 Text files
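Writing tables to a file using write.table() (the data folder must already exist)
aust <- gapminder[gapminder$country == "Australia",]
# With the default settings, strings are quoted and row names are written
write.table(aust,
file="data/gapminder_australia.csv",
sep=",")
# Without quotes or row names (this overwrites the previous file)
write.table(aust,
file="data/gapminder_australia.csv",
sep=",",
quote=FALSE,
row.names=FALSE)
# Tab-separated values
write.table(aust,
file="data/gapminder_australia.tsv",
sep="\t",
quote=FALSE,
row.names=FALSE)
Other functions to write to a file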
africa<-gapminder[gapminder$continent=="Africa",]
write.csv(africa,
file = "data/gapminder_africa.csv",
row.names = FALSE)
class(africa$continent)
## [1] "factor"
Reading data from a file. Since R 4.0.0, read.csv() and read.table() keep text columns as character by default, so the factor class is lost when the file is read back.
africa<-read.csv("data/gapminder_africa.csv",sep = ",",header = T)
class(africa$continent)
## [1] "character"
Use stringsAsFactors = TRUE to convert text columns to factors while reading
africa<-read.table("data/gapminder_africa.csv",sep = ",",header = T,stringsAsFactors = T)
class(africa$continent)
## [1] "factor"
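The examples above assume that a data folder already exists in the working directory; write.csv() does not create folders, but dir.create() does. The sketch below runs the same write/read round trip on a small hypothetical trial table, writing into R's temporary directory so that it does not touch your project files:

```r
# Hypothetical trial data
trial <- data.frame(arm = c("placebo", "drug", "drug"),
                    response = c(1.2, 3.4, 2.8))

# Create a data folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE)
path <- file.path(out_dir, "trial.csv")

write.csv(trial, file = path, row.names = FALSE)

back <- read.csv(path)                         # text columns stay character
class(back$arm)                                # "character"

back_f <- read.csv(path, stringsAsFactors = TRUE)
levels(back_f$arm)                             # "drug" "placebo": sorted alphabetically
```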
3.1.2 R objects
Using .RDS files, which store a single object
saveRDS(africa,file = "objects/africa.RDS")
africa<-readRDS(file = "objects/africa.RDS")
Using .RData files, which can store several objects at once
americas<-gapminder[gapminder$continent=="Americas",]
save(africa,americas,file = "objects/continents.RData")
load(file = "objects/continents.RData",verbose = T)
## Loading objects:
## africa
## americas
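The two formats behave differently when reading back: readRDS() returns the object, so you choose its name, while load() recreates objects under their original names, overwriting any objects with the same names in your session. A sketch using temporary files:

```r
scores <- c(5, 7, 9)

# .RDS: one object, assigned to any name you like
rds_path <- tempfile(fileext = ".RDS")
saveRDS(scores, file = rds_path)
my_scores <- readRDS(rds_path)

# .RData: objects come back under their saved names
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)
rm(scores)                       # remove it from the session
restored <- load(rdata_path)     # recreates 'scores'
restored                         # "scores": load() returns the names it restored
```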
3.2 Exploring data frames
3.2.1 Adding columns and rows
Adding columns one at a time
mean_children <- sample(1:10,nrow(aust),replace = TRUE)
aust$mean_children <- mean_children
head(aust)
## # A tibble: 6 × 7
## country continent year lifeExp pop gdpPercap mean_children
## <fct> <fct> <int> <dbl> <int> <dbl> <int>
## 1 Australia Oceania 1952 69.1 8691212 10040. 3
## 2 Australia Oceania 1957 70.3 9712569 10950. 5
## 3 Australia Oceania 1962 70.9 10794968 12217. 4
## 4 Australia Oceania 1967 71.1 11872264 14526. 5
## 5 Australia Oceania 1972 71.9 13177000 16789. 3
## 6 Australia Oceania 1977 73.5 14074100 18334. 3
mean_bikes <- sample(1:4,nrow(aust),replace = TRUE) # Check what happens if they don't have the same number of rows
aust[,"mean_bikes"]<-mean_bikes
head(aust)
## # A tibble: 6 × 8
## country continent year lifeExp pop gdpPercap mean_children mean_bikes
## <fct> <fct> <int> <dbl> <int> <dbl> <int> <int>
## 1 Australia Oceania 1952 69.1 8691212 10040. 3 4
## 2 Australia Oceania 1957 70.3 9712569 10950. 5 3
## 3 Australia Oceania 1962 70.9 10794968 12217. 4 2
## 4 Australia Oceania 1967 71.1 11872264 14526. 5 1
## 5 Australia Oceania 1972 71.9 13177000 16789. 3 2
## 6 Australia Oceania 1977 73.5 14074100 18334. 3 3
Combining data frames
aust <- gapminder[gapminder$country=="Australia",]
df <- data.frame(mean_children=sample(1:10,nrow(aust),replace = TRUE),
mean_bikes=sample(1:4,nrow(aust),replace = TRUE))
head(df)
## mean_children mean_bikes
## 1 8 3
## 2 7 4
## 3 1 2
## 4 2 1
## 5 4 3
## 6 7 3
aust <- cbind(aust,df)
head(aust)
## country continent year lifeExp pop gdpPercap mean_children mean_bikes
## 1 Australia Oceania 1952 69.12 8691212 10039.60 8 3
## 2 Australia Oceania 1957 70.33 9712569 10949.65 7 4
## 3 Australia Oceania 1962 70.93 10794968 12217.23 1 2
## 4 Australia Oceania 1967 71.10 11872264 14526.12 2 1
## 5 Australia Oceania 1972 71.93 13177000 16788.63 4 3
## 6 Australia Oceania 1977 73.49 14074100 18334.20 7 3
Individually adding rows
new_row<-list("country" = "Australia",
"continent" = "Oceania",
"year" = 2022,
"lifeExp" = mean(aust$lifeExp),
"pop" = mean(aust$pop),
"gdpPercap" = mean(aust$gdpPercap),
"mean_children" = floor(mean(aust$mean_children)),
"mean_bikes" = floor(mean(aust$mean_bikes))) # Why did I create it as a list?
new_row
## $country
## [1] "Australia"
##
## $continent
## [1] "Oceania"
##
## $year
## [1] 2022
##
## $lifeExp
## [1] 74.66292
##
## $pop
## [1] 14649312
##
## $gdpPercap
## [1] 19980.6
##
## $mean_children
## [1] 4
##
## $mean_bikes
## [1] 4
aust<-rbind(aust,new_row)
tail(aust)
## country continent year lifeExp pop gdpPercap mean_children
## 8 Australia Oceania 1987 76.32000 16257249 21888.89 5
## 9 Australia Oceania 1992 77.56000 17481977 23424.77 3
## 10 Australia Oceania 1997 78.83000 18565243 26997.94 2
## 11 Australia Oceania 2002 80.37000 19546792 30687.75 8
## 12 Australia Oceania 2007 81.23500 20434176 34435.37 7
## 13 Australia Oceania 2022 74.66292 14649312 19980.60 4
## mean_bikes
## 8 2
## 9 2
## 10 2
## 11 3
## 12 2
## 13 4
Combining data frames by rows
dim(aust)
## [1] 13 8
aust_double<-rbind(aust,aust)
dim(aust_double)
## [1] 26 8
3.2.2 Removing columns and rows
aust<-aust[,-ncol(aust)] # remove the last column
head(aust)
## country continent year lifeExp pop gdpPercap mean_children
## 1 Australia Oceania 1952 69.12 8691212 10039.60 8
## 2 Australia Oceania 1957 70.33 9712569 10949.65 7
## 3 Australia Oceania 1962 70.93 10794968 12217.23 1
## 4 Australia Oceania 1967 71.10 11872264 14526.12 2
## 5 Australia Oceania 1972 71.93 13177000 16788.63 4
## 6 Australia Oceania 1977 73.49 14074100 18334.20 7
aust<-aust[,colnames(aust)!="mean_children"] # remove a column by name
head(aust)
## country continent year lifeExp pop gdpPercap
## 1 Australia Oceania 1952 69.12 8691212 10039.60
## 2 Australia Oceania 1957 70.33 9712569 10949.65
## 3 Australia Oceania 1962 70.93 10794968 12217.23
## 4 Australia Oceania 1967 71.10 11872264 14526.12
## 5 Australia Oceania 1972 71.93 13177000 16788.63
## 6 Australia Oceania 1977 73.49 14074100 18334.20
dim(aust[-1,]) # Remove the first row
## [1] 12 6
dim(aust[-(1:10),]) # Remove the first 10 rows
## [1] 3 6
3.2.3 Applying filters
aust[aust$lifeExp>=70,]
## country continent year lifeExp pop gdpPercap
## 2 Australia Oceania 1957 70.33000 9712569 10949.65
## 3 Australia Oceania 1962 70.93000 10794968 12217.23
## 4 Australia Oceania 1967 71.10000 11872264 14526.12
## 5 Australia Oceania 1972 71.93000 13177000 16788.63
## 6 Australia Oceania 1977 73.49000 14074100 18334.20
## 7 Australia Oceania 1982 74.74000 15184200 19477.01
## 8 Australia Oceania 1987 76.32000 16257249 21888.89
## 9 Australia Oceania 1992 77.56000 17481977 23424.77
## 10 Australia Oceania 1997 78.83000 18565243 26997.94
## 11 Australia Oceania 2002 80.37000 19546792 30687.75
## 12 Australia Oceania 2007 81.23500 20434176 34435.37
## 13 Australia Oceania 2022 74.66292 14649312 19980.60
aust[aust$gdpPercap>=mean(aust$gdpPercap),]
## country continent year lifeExp pop gdpPercap
## 8 Australia Oceania 1987 76.32000 16257249 21888.89
## 9 Australia Oceania 1992 77.56000 17481977 23424.77
## 10 Australia Oceania 1997 78.83000 18565243 26997.94
## 11 Australia Oceania 2002 80.37000 19546792 30687.75
## 12 Australia Oceania 2007 81.23500 20434176 34435.37
## 13 Australia Oceania 2022 74.66292 14649312 19980.60
Getting unique entries (removing duplicated rows)
unique(aust_double)
## country continent year lifeExp pop gdpPercap mean_children
## 1 Australia Oceania 1952 69.12000 8691212 10039.60 8
## 2 Australia Oceania 1957 70.33000 9712569 10949.65 7
## 3 Australia Oceania 1962 70.93000 10794968 12217.23 1
## 4 Australia Oceania 1967 71.10000 11872264 14526.12 2
## 5 Australia Oceania 1972 71.93000 13177000 16788.63 4
## 6 Australia Oceania 1977 73.49000 14074100 18334.20 7
## 7 Australia Oceania 1982 74.74000 15184200 19477.01 4
## 8 Australia Oceania 1987 76.32000 16257249 21888.89 5
## 9 Australia Oceania 1992 77.56000 17481977 23424.77 3
## 10 Australia Oceania 1997 78.83000 18565243 26997.94 2
## 11 Australia Oceania 2002 80.37000 19546792 30687.75 8
## 12 Australia Oceania 2007 81.23500 20434176 34435.37 7
## 13 Australia Oceania 2022 74.66292 14649312 19980.60 4
## mean_bikes
## 1 3
## 2 4
## 3 2
## 4 1
## 5 3
## 6 3
## 7 3
## 8 2
## 9 2
## 10 2
## 11 3
## 12 2
## 13 4
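For data frames, unique() keeps the rows that duplicated() does not flag as repeats of an earlier row. Using duplicated() directly lets you inspect the repeated rows before dropping them. A sketch on a small hypothetical visit table:

```r
visits <- data.frame(id = c(1, 2, 2, 3),
                     visit = c("A", "B", "B", "C"))

dup <- duplicated(visits)
dup                               # FALSE FALSE TRUE FALSE: row 3 repeats row 2
visits[dup, ]                     # show the repeated rows
deduped <- visits[!dup, ]         # same rows as unique(visits)
nrow(deduped)                     # 3
```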
To remove empty rows
# First let's add an empty row made only of NA values
na.row<-rep(NA,ncol(aust))
aust<-rbind(aust,na.row)
tail(aust)
## country continent year lifeExp pop gdpPercap
## 9 Australia Oceania 1992 77.56000 17481977 23424.77
## 10 Australia Oceania 1997 78.83000 18565243 26997.94
## 11 Australia Oceania 2002 80.37000 19546792 30687.75
## 12 Australia Oceania 2007 81.23500 20434176 34435.37
## 13 Australia Oceania 2022 74.66292 14649312 19980.60
## 14 <NA> <NA> NA NA NA NA
aust<-aust[!is.na(aust$country),] # keep only rows with a non-missing country
tail(aust)
## country continent year lifeExp pop gdpPercap
## 8 Australia Oceania 1987 76.32000 16257249 21888.89
## 9 Australia Oceania 1992 77.56000 17481977 23424.77
## 10 Australia Oceania 1997 78.83000 18565243 26997.94
## 11 Australia Oceania 2002 80.37000 19546792 30687.75
## 12 Australia Oceania 2007 81.23500 20434176 34435.37
## 13 Australia Oceania 2022 74.66292 14649312 19980.60
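Filtering on a single column works when you know which column marks an empty row. Base R also offers complete.cases() and na.omit(), which drop every row that has at least one NA, and a rowSums() check that drops only rows where every value is missing. A sketch with hypothetical lab values:

```r
lab <- data.frame(patient = c("P1", "P2", NA, "P4"),
                  value   = c(5.1, NA, NA, 4.2))

complete <- complete.cases(lab)
complete                                   # TRUE FALSE FALSE TRUE

no_na <- na.omit(lab)                      # keeps rows 1 and 4 only

# Keep rows where at least one value is present (drops only row 3)
not_empty <- lab[rowSums(is.na(lab)) < ncol(lab), ]
```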
3.2.4 Editing specific elements
aust[1,"lifeExp"]<-aust[1,"lifeExp"]+1 # add 1 to the life expectancy in the first row
3.3 Hands-on: basic data manipulation
- Write a data processing snippet that saves only the data points collected after 1995 in Asian countries to a CSV file.
- Separate the gapminder data frame into 5 individual data frames, one for each continent. Store those 5 data frames as an RData file called continents.RData in the objects folder.
- Finish exploring the gapminder data frame and:
  - Find the number of rows and the number of columns
  - Print the data type of each column
  - Explain the meaning of everything that str(gapminder) prints
- In which years has the GDP of Canada been larger than the average of all data points recorded for Canada?
- Find the mean life expectancy of Switzerland before and after 2000
- You discovered that all the entries from 2007 are actually from 2008. Create a copy of the full gapminder data frame in an object called gp. Then change the year column to correct the entries from 2007.
- Bonus: Find the mean life expectancy and mean GDP per capita per continent using the function tapply
4 Advanced data manipulation
Learning objectives
- Become familiar with the dplyr syntax
- Create pipes with the operator %>%
- Perform operations on data frames using dplyr and tidyr functions
- Use functions from other external packages
Several packages allow more sophisticated processing operations to be done faster. In this section we look at functions from one of them, dplyr. I encourage you to look into plyr and tidyr after this workshop.
4.1 Manipulation with dplyr
We often need to select certain observations (rows) or variables (columns), or group the data by certain variable(s) to calculate summary statistics. Although these operations can be done with base R functions, doing so requires creating multiple intermediate objects and repeating a lot of code. The dplyr and tidyr packages provide functions that streamline common operations on tabular data and make the code cleaner and easier to read.
These packages are part of a broader family called the tidyverse; for more information, visit https://www.tidyverse.org/.
We will cover five of the most commonly used functions and combine them using pipes (%>%):
1. select() - to choose columns (variables)
2. filter() - to keep rows (observations) that meet logical conditions
3. group_by() - to group rows by one or more variables, the "split" step of split-apply-combine
4. summarize() - to compute summary statistics, per group if the data are grouped
5. mutate() - to create new columns
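library(dplyr)
library(tidyr)
4.1.1 Introducing pipes
The pipe %>% (available once dplyr or tidyr is loaded) passes the object on its left as the first argument of the function on its right, so gapminder %>% head() is the same as head(gapminder).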
gapminder %>%
head()
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
gapminder %>%
tail()
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Zimbabwe Africa 1982 60.4 7636524 789.
## 2 Zimbabwe Africa 1987 62.4 9216418 706.
## 3 Zimbabwe Africa 1992 60.4 10704340 693.
## 4 Zimbabwe Africa 1997 46.8 11404948 792.
## 5 Zimbabwe Africa 2002 40.0 11926563 672.
## 6 Zimbabwe Africa 2007 43.5 12311143 470.
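Since R 4.1.0, base R also has a native pipe, |>, which works the same way for simple calls like these: x |> f() is equivalent to f(x). The sketch below uses it so that it runs without loading any package:

```r
# Nested calls are read from the inside out...
nested <- sum(sqrt(c(1, 4, 9)))
# ...while a pipe reads from left to right
piped <- c(1, 4, 9) |> sqrt() |> sum()

nested   # 6
piped    # 6
```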
4.1.2 Using select()
To keep only some columns of a data frame
dplyr::select(.data = gapminder,
year, country, gdpPercap) %>%
head()
## # A tibble: 6 × 3
## year country gdpPercap
## <int> <fct> <dbl>
## 1 1952 Afghanistan 779.
## 2 1957 Afghanistan 821.
## 3 1962 Afghanistan 853.
## 4 1967 Afghanistan 836.
## 5 1972 Afghanistan 740.
## 6 1977 Afghanistan 786.
To remove columns, put a minus sign in front of their names
dplyr::select(.data = gapminder,
-continent) %>%
head()
## # A tibble: 6 × 5
## country year lifeExp pop gdpPercap
## <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan 1952 28.8 8425333 779.
## 2 Afghanistan 1957 30.3 9240934 821.
## 3 Afghanistan 1962 32.0 10267083 853.
## 4 Afghanistan 1967 34.0 11537966 836.
## 5 Afghanistan 1972 36.1 13079460 740.
## 6 Afghanistan 1977 38.4 14880372 786.
gapminder %>%
dplyr::select(year, country, gdpPercap) %>%
head()
## # A tibble: 6 × 3
## year country gdpPercap
## <int> <fct> <dbl>
## 1 1952 Afghanistan 779.
## 2 1957 Afghanistan 821.
## 3 1962 Afghanistan 853.
## 4 1967 Afghanistan 836.
## 5 1972 Afghanistan 740.
## 6 1977 Afghanistan 786.
4.1.3 Using filter()
Include only European countries and select the columns year, country and gdpPercap
gapminder %>%
dplyr::filter(continent == "Europe") %>%
dplyr::select(year, country, gdpPercap) %>%
head()
## # A tibble: 6 × 3
## year country gdpPercap
## <int> <fct> <dbl>
## 1 1952 Albania 1601.
## 2 1957 Albania 1942.
## 3 1962 Albania 2313.
## 4 1967 Albania 2760.
## 5 1972 Albania 3313.
## 6 1977 Albania 3533.
Using multiple conditions at once (conditions separated by commas must all be TRUE)
gapminder %>%
dplyr::filter(continent == "Europe", year == 2007) %>%
dplyr::select(country, lifeExp)
## # A tibble: 30 × 2
## country lifeExp
## <fct> <dbl>
## 1 Albania 76.4
## 2 Austria 79.8
## 3 Belgium 79.4
## 4 Bosnia and Herzegovina 74.9
## 5 Bulgaria 73.0
## 6 Croatia 75.7
## 7 Czech Republic 76.5
## 8 Denmark 78.3
## 9 Finland 79.3
## 10 France 80.7
## # … with 20 more rows
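In base R, the same kind of multi-condition filter combines the conditions with &. For comparison, here is a base R sketch on a small hypothetical data frame that only mimics the gapminder columns (the values are made up):

```r
gm <- data.frame(country   = c("A", "B", "C", "A"),
                 continent = c("Europe", "Europe", "Africa", "Europe"),
                 year      = c(2007, 2007, 2007, 2002),
                 lifeExp   = c(70, 75, 50, 68))

# subset() evaluates the conditions inside the data frame
eu_2007 <- subset(gm, continent == "Europe" & year == 2007,
                  select = c(country, lifeExp))

# The same selection with square brackets
eu_2007_br <- gm[gm$continent == "Europe" & gm$year == 2007, c("country", "lifeExp")]
```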
Extract unique combinations of values with distinct()
gapminder %>%
dplyr::select(country, continent) %>%
dplyr::distinct()
## # A tibble: 142 × 2
## country continent
## <fct> <fct>
## 1 Afghanistan Asia
## 2 Albania Europe
## 3 Algeria Africa
## 4 Angola Africa
## 5 Argentina Americas
## 6 Australia Oceania
## 7 Austria Europe
## 8 Bahrain Asia
## 9 Bangladesh Asia
## 10 Belgium Europe
## # … with 132 more rows
Sort rows by a column with arrange(); desc() sorts in descending order
gapminder %>%
dplyr::select(country, continent,year,pop) %>%
dplyr::arrange(dplyr::desc(pop)) %>%
head()
## # A tibble: 6 × 4
## country continent year pop
## <fct> <fct> <int> <int>
## 1 China Asia 2007 1318683096
## 2 China Asia 2002 1280400000
## 3 China Asia 1997 1230075000
## 4 China Asia 1992 1164970000
## 5 India Asia 2007 1110396331
## 6 China Asia 1987 1084035000
4.1.4 Using group_by()
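group_by() does not change the rows or columns of the data; it attaches grouping information (the "groups" attribute in the second output below) that later functions such as summarize() use to operate on each group separately.
str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)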
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
str(gapminder %>% dplyr::group_by(continent))
## grouped_df [1,704 × 6] (S3: grouped_df/tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
## - attr(*, "groups")= tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
## ..$ continent: Factor w/ 5 levels "Africa","Americas",..: 1 2 3 4 5
## ..$ .rows : list<int> [1:5]
## .. ..$ : int [1:624] 25 26 27 28 29 30 31 32 33 34 ...
## .. ..$ : int [1:300] 49 50 51 52 53 54 55 56 57 58 ...
## .. ..$ : int [1:396] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ : int [1:360] 13 14 15 16 17 18 19 20 21 22 ...
## .. ..$ : int [1:24] 61 62 63 64 65 66 67 68 69 70 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
4.1.5 Using summarize()
gdp_c <- gapminder %>%
dplyr::group_by(continent) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap))
gdp_c
## # A tibble: 5 × 2
## continent mean_gdpPercap
## <fct> <dbl>
## 1 Africa 2194.
## 2 Americas 7136.
## 3 Asia 7902.
## 4 Europe 14469.
## 5 Oceania 18622.
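The group_by() plus summarize() pattern is the split-apply-combine idea from the function list above, and tapply() or aggregate() do the same job in base R. A sketch on hypothetical numbers rather than the real gapminder values:

```r
toy <- data.frame(continent = c("Africa", "Africa", "Europe", "Europe"),
                  gdpPercap = c(1000, 3000, 10000, 20000))

# tapply(): a named vector with one mean per group
by_tapply <- tapply(toy$gdpPercap, toy$continent, mean)   # Africa 2000, Europe 15000

# aggregate(): the same means returned as a data frame
by_aggregate <- aggregate(gdpPercap ~ continent, data = toy, FUN = mean)
by_aggregate
```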
You can combine multiple summary statistics in a single call:
gapminder %>%
dplyr::group_by(continent) %>%
dplyr::summarize(mean_le = mean(lifeExp),
min_le = min(lifeExp),
max_le = max(lifeExp),
se_le = sd(lifeExp)/sqrt(dplyr::n()))
## # A tibble: 5 × 5
## continent mean_le min_le max_le se_le
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Africa 48.9 23.6 76.4 0.366
## 2 Americas 64.7 37.6 80.7 0.540
## 3 Asia 60.1 28.8 82.6 0.596
## 4 Europe 71.9 43.6 81.8 0.286
## 5 Oceania 74.3 69.1 81.2 0.775
4.1.6 Using mutate()
gapminder %>%
dplyr::mutate(gdp_billion = gdpPercap*pop/10^9)
## # A tibble: 1,704 × 7
## country continent year lifeExp pop gdpPercap gdp_billion
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 6.57
## 2 Afghanistan Asia 1957 30.3 9240934 821. 7.59
## 3 Afghanistan Asia 1962 32.0 10267083 853. 8.76
## 4 Afghanistan Asia 1967 34.0 11537966 836. 9.65
## 5 Afghanistan Asia 1972 36.1 13079460 740. 9.68
## 6 Afghanistan Asia 1977 38.4 14880372 786. 11.7
## 7 Afghanistan Asia 1982 39.9 12881816 978. 12.6
## 8 Afghanistan Asia 1987 40.8 13867957 852. 11.8
## 9 Afghanistan Asia 1992 41.7 16317921 649. 10.6
## 10 Afghanistan Asia 1997 41.8 22227415 635. 14.1
## # … with 1,694 more rows
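mutate() keeps every row and appends the new columns. Within a single mutate() call, a column created earlier can already be used by the next expression, which keeps multi-step calculations readable. This is a minimal sketch in which the intermediate column name gdp is our own choice.

```r
library(dplyr)
library(gapminder)

gdp_tbl <- gapminder %>%
  dplyr::mutate(gdp = gdpPercap * pop,      # total GDP
                gdp_billion = gdp / 1e9)    # reuses the column created on the line above

dim(gdp_tbl)             # 1704 rows, 8 columns
gdp_tbl$gdp_billion[1]   # about 6.57, as in the output above
```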
4.1.7 Putting them all together
gdp_pop_ext <- gapminder %>%
dplyr::mutate(gdp_billion = gdpPercap*pop/10^9) %>%
dplyr::group_by(continent,year) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap),
sd_gdpPercap = sd(gdpPercap),
mean_pop = mean(pop),
sd_pop = sd(pop),
mean_gdp_billion = mean(gdp_billion),
sd_gdp_billion = sd(gdp_billion))
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
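The message above is informational. After summarizing by continent and year, dplyr drops the last grouping level (year) and keeps the result grouped by continent. If you prefer an ungrouped result and no message, you can set .groups explicitly. This sketch assumes dplyr 1.0.0 or later, the version that introduced the .groups argument.

```r
library(dplyr)
library(gapminder)

gdp_pop_drop <- gapminder %>%
  dplyr::group_by(continent, year) %>%
  dplyr::summarize(mean_gdpPercap = mean(gdpPercap),
                   .groups = "drop")   # return an ungrouped tibble

dplyr::is_grouped_df(gdp_pop_drop)  # FALSE
nrow(gdp_pop_drop)                  # 60: 5 continents x 12 years
```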
4.2 Hands-on advanced data manipulation
- Write one command (it can span multiple lines) using pipes that outputs a data frame containing only the columns lifeExp, country and year for the records before the year 2000 from African countries, excluding all other continents.
- Calculate the average life expectancy per country. Which country has the longest average life expectancy and which one the shortest average life expectancy?
- In the previous hands-on you discovered that all the entries from 2007 are actually from 2008. Write a command using pipes that corrects the data accordingly. In the same command, keep only the entries from 2008 to verify the change.
5 Generating visual outputs
5.1 Graphics with base R
hist(gapminder$lifeExp,xlab="Life expectancy")
Arrange figures into multiple panels with par():
df<-gapminder[gapminder$country=="Switzerland",]
par(mfrow=c(1,3))
plot(y = df$lifeExp,x=df$year,xlab="Years",ylab="Life expectancy")
plot(y = df$pop,x=df$year,xlab="Years",ylab="Population size")
plot(y = df$gdpPercap,x=df$year,xlab="Years",ylab="GDP per capita")
df<-gapminder[gapminder$country=="Zimbabwe",]
par(mfrow=c(1,3))
plot(y = df$lifeExp,x=df$year,xlab="Years",ylab="Life expectancy")
plot(y = df$pop,x=df$year,xlab="Years",ylab="Population size")
plot(y = df$gdpPercap,x=df$year,xlab="Years",ylab="GDP per capita")
5.2 Graphics with ggplot2
library(ggplot2)
We can look at multiple countries at the same time in a prettier way:
df<-gapminder %>%
dplyr::mutate(country = as.character(country)) %>%
dplyr::filter(country %in% c("Switzerland","Australia","Zimbabwe","India"))
ggplot(df,aes(x=year,y=lifeExp,color=country))+
geom_point()+
geom_line()
ggplot(df,aes(x=year,y=gdpPercap,color=country))+
geom_point()+
geom_line()
Now, let’s plot the mean GDP per capita over time for each continent:
gdp_c <- gapminder %>%
dplyr::group_by(continent,year) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap),
mean_le = mean(lifeExp),
min_le = min(lifeExp),
max_le = max(lifeExp),
se_le = sd(lifeExp)/sqrt(dplyr::n()))
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
head(gdp_c)
## # A tibble: 6 × 7
## # Groups: continent [1]
## continent year mean_gdpPercap mean_le min_le max_le se_le
## <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Africa 1952 1253. 39.1 30 52.7 0.714
## 2 Africa 1957 1385. 41.3 31.6 58.1 0.779
## 3 Africa 1962 1598. 43.3 32.8 60.2 0.815
## 4 Africa 1967 2050. 45.3 34.1 61.6 0.844
## 5 Africa 1972 2340. 47.5 35.4 64.3 0.890
## 6 Africa 1977 2586. 49.6 36.8 67.1 0.944
ggplot(gdp_c,aes(x=year,y=mean_gdpPercap,color=continent))+
geom_point()+
geom_line()
We can pipe objects directly into the ggplot() function:
gdp_c %>%
ggplot(aes(x=year,y=mean_gdpPercap,color=continent))+
geom_point()+
geom_line()
We can even compute the summary and plot it in a single pipeline:
gapminder %>%
dplyr::group_by(continent,year) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap)) %>%
ggplot(aes(x=year,y=mean_gdpPercap,color=continent))+
geom_point()+
geom_line()
5.2.0.1 Exercise
Plot the life expectancy over time of all countries for the years in which the population size was larger than 2e+06 (two million).
gapminder %>%
dplyr::filter(pop>2e+06) %>%
ggplot(aes(x=year,y=lifeExp,color=country))+
geom_point()+
geom_line()+
facet_wrap(~continent)+
theme(legend.position = "none")
5.2.1 Some ggplot tricks
Make sure your data is in the right format (wide vs long). ggplot2 usually expects data in long format, with one row per observation and one column per variable. The functions tidyr::pivot_wider() and tidyr::pivot_longer() are very useful for converting one format into the other.
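This is a minimal round trip between the two formats. It assumes tidyr is installed in addition to dplyr and gapminder. pivot_wider() spreads the years into columns (wide format, one row per country), and pivot_longer() gathers them back into one row per country and year (long format), which is the shape ggplot() works with most naturally. The object names le_wide and le_long are our own.

```r
library(dplyr)
library(tidyr)
library(gapminder)

# Wide: one row per country, one column per year
le_wide <- gapminder %>%
  dplyr::select(country, year, lifeExp) %>%
  tidyr::pivot_wider(names_from = year, values_from = lifeExp)

# Long: one row per country and year
le_long <- le_wide %>%
  tidyr::pivot_longer(cols = -country, names_to = "year", values_to = "lifeExp") %>%
  dplyr::mutate(year = as.integer(year))  # the former column names come back as text

dim(le_wide)  # 142 rows, 13 columns
dim(le_long)  # 1704 rows, 3 columns
```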
?tidyr::pivot_wider
?tidyr::pivot_longer
To change the order of the colors, modify the factor levels:
gapminder %>%
dplyr::mutate(continent = factor(as.character(continent),
levels = c("Oceania","Europe","Africa","Americas","Asia"))) %>%
dplyr::group_by(continent,year) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap)) %>%
ggplot(aes(x=year,y=mean_gdpPercap,color=continent))+
geom_point()+
geom_line()
You can store a plot in an object and keep adding layers to it:
p <- gapminder %>%
dplyr::mutate(continent = factor(as.character(continent),
levels = c("Oceania","Europe","Africa","Americas","Asia"))) %>%
dplyr::group_by(continent,year) %>%
dplyr::summarize(mean_gdpPercap = mean(gdpPercap)) %>%
ggplot(aes(x=year,y=mean_gdpPercap,color=continent))+
geom_point()+
geom_line()
# Change the color palette
p + scale_color_viridis_d(begin = 0.1,end=0.8)
6 Real life application
- How many clinics participated in the study, and how many valid tests were performed on each one? Did the testing trend vary over time?
- How many patients tested positive vs negative in the first 100 days of the pandemic? Do you notice any difference with the age of the patients? Hint: you can make two age groups and calculate the percentage of each age group among positive vs negative tests. Try using the function ifelse() to do this.
- Look at the specimen processing time to receipt: did the sample processing times improve over the first 100 days of the pandemic? Plot the median processing time of each day over the course of the pandemic, and then compare the summary statistics of the first 50 vs the last 50 days.
- Bonus: higher viral loads are detected in fewer PCR cycles. What can you observe about the viral load of positive vs negative samples? Do you notice any differences in viral load across ages in the positive samples? Hint: also split the data into two age groups and try using geom_boxplot().
library(medicaldata)
covid<-covid_testing
dim(covid)
## [1] 15524 17
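You can try the ifelse() hint from the questions above on a small made-up example before applying it to covid. The ages, results and the 50-year cutoff below are hypothetical and are not taken from covid_testing. prop.table() with margin = 1 converts the counts in each row of the table into proportions.

```r
# Hypothetical toy data, not taken from covid_testing
ages    <- c(5, 23, 47, 61, 80)
results <- c("positive", "negative", "negative", "positive", "negative")

# Two age groups with ifelse(): condition, value if TRUE, value if FALSE
age_group <- ifelse(ages < 50, "under_50", "50_and_over")

# Percentage of positive and negative tests within each age group (rows sum to 100)
counts <- table(age_group, results)
round(100 * prop.table(counts, margin = 1), 1)
```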
7 Software development concepts
7.1 Good coding practices
7.1.1 Script structure
- Use comments to create sections.
- Load all required packages at the very beginning.
- Write all function definitions after the package-loading section, or put your functions in a standalone file and load it into the main script with source().
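Following these three rules, a script might look like the sketch below. The section names and the StandardError() helper are our own illustrative choices, and the helper follows the naming conventions described in the next section. In RStudio, comment lines ending in four or more dashes appear as collapsible sections in the document outline.

```r
# Packages ----------------------------------------------------------------
library(dplyr)      # data manipulation and %>%
library(gapminder)  # example data

# Functions ---------------------------------------------------------------
# (these could also live in a separate file loaded with source())
StandardError <- function(x) {
  # Description: standard error of the mean
  # Input: x = numeric vector
  # Output: numeric
  return(sd(x) / sqrt(length(x)))
}

# Analysis ----------------------------------------------------------------
le_se <- gapminder %>%
  dplyr::group_by(continent) %>%
  dplyr::summarize(se_le = StandardError(lifeExp))
le_se
```

Inside summarize(), length(x) equals dplyr::n() for each group, so le_se reproduces the se_le column computed earlier.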
7.1.2 Functions
Name functions by capitalizing the first letter of each word:
# Good
DoNothing <- function() {
return(invisible(NULL))
}
# Bad
donothing <- function() {
return(invisible(NULL))
}
Use explicit returns:
# Good
AddValues <- function(x, y) {
return(x + y)
}
# Bad
AddValues <- function(x, y) {
x + y
}
Describe what the function does, its input parameters, and its output using comments inside the function:
AddValues <- function(x, y) {
# Description: Function to add two numeric variables
# Input
# x = numeric
# y = numeric
# Output: numeric
return(x + y)
}
Testing and documenting
- Use formal documentation for functions whenever you are writing more complicated projects. This documentation is written in separate .Rd files, and it becomes the documentation displayed in the help files.
- The roxygen2 package allows R coders to write documentation alongside the function code and then process it into the appropriate .Rd files.
- Formal automated tests can be written using the testthat package.
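This sketch applies both ideas to the AddValues() function from above and assumes roxygen2 and testthat are installed. Lines starting with #' are roxygen2 comments. Inside a package, roxygen2::roxygenise() (or devtools::document()) turns them into an .Rd help file. Outside a package, they are ordinary comments. The test uses testthat::test_that() and testthat::expect_equal(). The file path mentioned in the comment follows the usual testthat layout.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The sum of x and y.
#' @examples
#' AddValues(2, 3)
AddValues <- function(x, y) {
  return(x + y)
}

# In a package this would typically live in tests/testthat/test-AddValues.R
testthat::test_that("AddValues adds two numbers", {
  testthat::expect_equal(AddValues(2, 3), 5)
  testthat::expect_equal(AddValues(c(1, 2), 1), c(2, 3))
})
```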
7.1.3 External packages
Packages are essentially bundles of functions with formal documentation. Loading your own functions through source("functions.R") is similar to loading someone else’s using library("package").
As a general rule, only load a package using library() if you are going to use more than two functions from it.
Use the namespace (package::function()) when calling an external function. Otherwise, clashes can occur when two packages have a function with the same name.
# Good
purrr::map()
# Bad
map()
7.2 Debugging and troubleshooting
General advice:
- Create a minimal reproducible example of your error.
- Whenever you see an error, copy the full message and paste it into the search bar of your web browser. There is a lot of support out there, and most likely someone has come across the same error before.
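A minimal reproducible example contains only the packages, data and call needed to trigger the problem. The sketch below uses a made-up three-value data frame to illustrate an unexpected NA result. dput() prints R code that recreates an object, so others can paste your data directly, and sessionInfo() reports your R and package versions. The reprex package can also render such an example, together with its output, for posting online.

```r
# Made-up data that reproduces the problem in as few lines as possible
df <- data.frame(x = c(1, 2, NA))

mean(df$x)                # NA: the missing value propagates
mean(df$x, na.rm = TRUE)  # 1.5 once missing values are removed

# Print code that recreates df, ready to paste into a question
dput(df)

# Report your R and package versions alongside the example
sessionInfo()
```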